
ENH: Supporting REPEATED schema for list types #60


Closed
txomon wants to merge 3 commits

Conversation

@txomon commented Jun 22, 2017

Basically, at the moment, if there are lists inside a column we have no way to put that data into GBQ. This enables the REPEATED mode in a simple way.
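For context, a minimal sketch of the kind of frame this targets and the shape a REPEATED field takes in a BigQuery schema (the column names and the schema dict below are illustrative, not taken from the diff):

```python
import pandas as pd

# A column whose cells are Python lists; the current code path has no way
# to describe this in the generated BigQuery schema.
df = pd.DataFrame({
    "user_id": [1, 2],
    "tags": [["a", "b"], ["c"]],
})

# Rough shape of a schema containing a REPEATED field, in the BigQuery REST
# representation (hand-written here for illustration):
schema = {
    "fields": [
        {"name": "user_id", "type": "INTEGER", "mode": "NULLABLE"},
        {"name": "tags", "type": "STRING", "mode": "REPEATED"},
    ]
}
```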

@codecov-io commented Jun 22, 2017

Codecov Report

Merging #60 into master will decrease coverage by 45.14%.
The diff coverage is 44.44%.


@@             Coverage Diff             @@
##           master      #60       +/-   ##
===========================================
- Coverage   73.44%   28.29%   -45.15%     
===========================================
  Files           4        4               
  Lines        1540     1548        +8     
===========================================
- Hits         1131      438      -693     
- Misses        409     1110      +701
Impacted Files                 Coverage Δ
pandas_gbq/tests/test_gbq.py   27.89% <20%> (-54.99%) ⬇️
pandas_gbq/gbq.py              19.62% <75%> (-55.71%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 852b2a3...30c4248.

@jreback (Contributor) commented Jun 22, 2017

tests!

@txomon (Author) commented Jun 30, 2017

I don't get what is failing; the test is there, but it doesn't look like it's being executed... 🤔

@max-sixty (Contributor) commented Jun 30, 2017

FWIW it's not 'pandantic' to put lists in a dataframe, and IIRC if they're on an index you'll face issues with a lot of pandas functionality

We approach this with either:

  1. separate dataframes uploaded to BQ, and then run a BQ query to merge them into a nested format in a single table (a sketch of this follows the list)
  2. stack / unstack, with a BQ query to merge columns
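A minimal sketch of approach 1, assuming the parent and child rows live in two flat frames and a staging dataset already exists; the project id, table names, and column names are placeholders:

```python
import pandas as pd
import pandas_gbq

parents = pd.DataFrame({"user_id": [1, 2], "name": ["ana", "bo"]})
children = pd.DataFrame({"user_id": [1, 1, 2], "tag": ["a", "b", "c"]})

# Upload the two flat frames to staging tables.
pandas_gbq.to_gbq(parents, "staging.users", project_id="my-project")
pandas_gbq.to_gbq(children, "staging.user_tags", project_id="my-project")

# Then merge them into a nested (REPEATED) column with a BigQuery query,
# e.g. by writing the result of this standard-SQL statement to the final table:
nesting_query = """
SELECT u.user_id, u.name, ARRAY_AGG(t.tag) AS tags
FROM `my-project.staging.users` AS u
JOIN `my-project.staging.user_tags` AS t USING (user_id)
GROUP BY u.user_id, u.name
"""
```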

@txomon (Author) commented Jul 3, 2017

@MaximilianR I guessed it is not, but it is something we use; thankfully not too often, but we still need it.

  1. Should we emit a warning when a list schema is generated (something like the sketch below)?
  2. Can you tell me why the test isn't being executed?
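On point 1, a hedged sketch of what such a warning might look like (not code from the PR, just an illustration with a made-up column name):

```python
import warnings

# Warn the caller that a list-valued column will be mapped to a REPEATED field.
warnings.warn(
    "Column 'tags' contains list values and will be uploaded as a REPEATED field.",
    UserWarning,
)
```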

@max-sixty (Contributor) commented

What makes you think it's not being executed?

@parthea requested a review from tswast July 11, 2017 03:45
@parthea (Contributor) commented Jul 11, 2017

Thanks @txomon! I think #25 may help us provide support for BigQuery repeated fields. I'd like to keep this PR open just in case #25 doesn't provide this functionality.

Regarding the integration tests being skipped, some integration tests will be skipped if a BigQuery project id is not set. I think the tests should fail if a project id is not set. I've created #72 to track this improvement.

Follow these steps to run the BigQuery integration tests on Travis:
https://pandas-gbq.readthedocs.io/en/latest/contributing.html#running-google-bigquery-integration-tests

@tswast (Collaborator) commented Jul 11, 2017

Yeah, #25 may help. This is also something to make sure we handle when we switch from streaming writes to bulk upload #7.

@max-sixty (Contributor) commented

Anyone know whether this is still an issue?

@tswast (Collaborator) commented Apr 3, 2018

A similar issue was fixed for read_gbq in #134.

I think this PR would probably be useful if we wanted to support list types in to_gbq, but the fix would be more involved. (Would need to switch to a different file type than CSV for load jobs, maybe Parquet?)
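For reference, a rough sketch of what the Parquet route could look like, assuming pyarrow and the google-cloud-bigquery client are used directly rather than pandas-gbq's current CSV path; the project, dataset, table, and file names are placeholders:

```python
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
from google.cloud import bigquery

df = pd.DataFrame({"user_id": [1, 2], "tags": [["a", "b"], ["c"]]})

# Parquet can represent the list column natively, unlike CSV.
pq.write_table(pa.Table.from_pandas(df), "/tmp/rows.parquet")

# Load the Parquet file; BigQuery maps the list column to a REPEATED field.
client = bigquery.Client(project="my-project")
job_config = bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.PARQUET)
with open("/tmp/rows.parquet", "rb") as f:
    job = client.load_table_from_file(f, "my_dataset.my_table", job_config=job_config)
job.result()
```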

@max-sixty (Contributor) commented

Right, and I think pandas is then not the right tool for writing nested items - lists are poorly supported, if at all (I'm doubtful that it can write nested Parquet files, for example, so we'd have to write that part too).

We've thought about something that would go from xarray (i.e. 3+ dimensions) to a nested format to BQ, but even then there are difficulties (which dimension becomes nested? How can you create the JSON / dict in a memory-efficient way in python?)

I would vote to close, but I'm very open if there are any creative ideas.

@max-sixty (Contributor) commented

Closing as stale, but open to a solution for writing repeated fields

@max-sixty closed this Aug 22, 2018